Binary (Logistic) Regression with Two Quantitative Predictors

To illustrate a regression model with two quantitative predictors, we will develop a logistic regression model for obesity as a function of age (adults 18-79) and household income (up to $100,000) for the US population. We will treat household income, an ordinal variable, as a quantitative variable coded using the midpoint values for each category.

Age and household income are quantitative variables. As we’ve discussed, we typically model the effects of quantitative variables using natural cubic splines. In formulating our initial model, we need to specify (1) the number of parameters (df) (complexity or degree of nonlinearity) for the age effect, (2) the number of parameters (df) (complexity or degree of nonlinearity) for the income effect, and (3) whether to assume additivity or to allow for possible interaction between age and income.

As we’ve discussed, a natural cubic spline with 4 df typically represents an appropriate balance between flexibility and loss of precision caused by overfitting. Small samples (\(N < 30\)) may require the use of 2 df, and moderate samples (\(N < 100\)) may require the use of 3 df. Empirical evidence suggests that more than 4 df are seldom required in a natural cubic spline regression model. Cubic spline models with 5 or 6 df would typically only be considered if the sample size was extremely large.

Based on these considerations, we will represent age and income using a natural cubic spline with 4 df.

As discussed previously for two categorical predictors, the choice of an additive model or an interaction model depends primarily on the scientific context and, to a lesser extent, on the available sample size. In smaller samples, the number of parameters \(p\) (regression coefficients) that can be reliably estimated is \(m/15\), where \(m\) is the effective sample size (for logistic regression, \(m\) is the minimum of the number of events and the number of non-events in our sample).

In this specific case, in my opinion, an additive model for the effects of age and income is potentially reasonable.

So, our initial model is an additive logistic regression model for prevalence of obesity as a function of age (represented as a natural cubic spline with 4 df) and income (represented as a natural cubic spline with 4 df).

Additive Model

We denote the obesity outcome for subject \(i\) by \(y_i\), the age for subject \(i\) by \(x_{1i}\), and the income for subject \(i\) by \(x_{2i}\).

An additive logistic regression model for (prevalence of) obesity as a function of age and income is \[ y_i \sim \mbox{Bernoulli}\left\{p_i\right\} \\ \mbox{logit}\left\{p_i\right\} = \log{\left\{\frac{p_i}{1-p_i}\right\}} = \alpha_0 + f_{a}(x_{1i},{\bf \beta}) + f_{i}(x_{2i},{\bf \gamma})\]

For modeling, age will be represented by a basis for the natural cubic spline with 4 df (5 knots), e.g. \(a_1(x_{2}), a_2(x_{2}), a_3(x_{2}), a_4(x_{2})\).

In our regression model, \[f_{a}(x_{2},{\bf \gamma}) = \gamma_1 a_1(x_{2}) + \gamma_2 a_2(x_{2}) + \gamma_3 a_3(x_{2}) + \gamma_4 a_4(x_{2})\]

For modeling, income will be represented by a basis for the natural cubic spline with 4 df (5 knots), e.g. \(i_1(x_{2}), i_2(x_{2}), i_3(x_{2}), i_4(x_{2})\).

In our regression model, \[f_{i}(x_{2},{\bf \gamma}) = \gamma_1 i_1(x_{2}) + \gamma_2 i_2(x_{2}) + \gamma_3 i_3(x_{2}) + \gamma_4 i_4(x_{2})\]

LIBNAME NHANES "../../../BMI552_Datasets/NHANES_May2023/";
*LIBNAME NHANES "~/my_shared_file_links/u48718377/NHANES";
OPTIONS FMTSEARCH=(NHANES.formats_NHANES work) NODATE NONUMBER;
TITLE1; TITLE2;
ODS NOPROCTITLE;
ODS GRAPHICS / LOESSMAXOBS=20000;

DATA Ron;
  SET NHANES.NHANES;
RUN;

ODS SELECT NONE;
PROC LOGISTIC DATA=Ron;
  ODS SELECT ResponseProfile;
*  ODS SELECT SplineKnots;
  EFFECT AgeCS=SPLINE(Age / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);
  EFFECT IncomeCS=SPLINE(HHIncomeMid / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);
    
  MODEL Obese(EVENT="Yes") = AgeCS IncomeCS / LINK=LOGIT;

  OUTPUT OUT=Tmp RESCHI=Residual PREDICTED=Fitted; 

  STORE AgeIncome;
RUN;
ODS SELECT ALL;
Response Profile
Ordered
Value
Obese Total
Frequency
1 No 4643
2 Yes 2592

Probability modeled is Obese='Yes'.

Assessing Adequacy of Modeling Assumptions

Before using the model, we must evaluate the adequacy of the modeling assumptions. For logistic regression models, the only assumptions are correct structural model and independence (which cannot be evaluated based on the observed data). We can identify potential inadequacies in the representation of the effects of age or income, e.g. insufficient df, or the assumption of additivity either using informal plots of the residuals against age by income or income by age or formal comparisons of fit (AIC or BIC) for models with more df devoted to the age effect or models with interaction terms.

PROC FORMAT;
  VALUE IncomeGrp
    LOW-19999="0-19999"
    20000-39999="20000-39999"
    40000-59999="40000-59999"
    60000-79999="60000-79999"
    80000-99999="80000-99999"
    ;

  VALUE AgeGrp
    LOW-29="18-29"
    30-39="30-39"
    40-49="40-49"
    50-59="50-59"
    60-69="60-69"
    70-79="70-79"
    ;
RUN;
        
PROC SGPANEL DATA=Tmp;
  FORMAT HHIncomeMid IncomeGrp.;
  LABEL HHIncomeMid="Household Income";
  PANELBY HHIncomeMid / LAYOUT=ROWLATTICE;
    REFLINE 0 / AXIS=Y;
  LOESS X=Age Y=Residual / JITTER; 
RUN;

PROC SGPANEL DATA=Tmp;
  FORMAT Age AgeGrp.;
  PANELBY Age / LAYOUT=ROWLATTICE;
    REFLINE 0 / AXIS=Y;
  LOESS X=HHIncomeMid Y=Residual / JITTER; 
RUN;
The SGPanel Procedure Loess Pearson Residual 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 -1 0 1 2 Household Income = 20000-39999 Household Income = 0-19999
The SGPanel Procedure Loess Pearson Residual 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 -1 0 1 2 Household Income = 60000-79999 Household Income = 40000-59999
The SGPanel Procedure Loess Pearson Residual 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 -1 0 1 2 Household Income = 100000 Household Income = 80000-99999

The SGPanel Procedure Loess 0 20000 40000 60000 80000 100000 Household Income (Mid-Point of Category) -1 0 1 2 Pearson Residual Age (yrs) = 18-29
The SGPanel Procedure Loess 0 20000 40000 60000 80000 100000 Household Income (Mid-Point of Category) -1 0 1 2 Pearson Residual Age (yrs) = 30-39
The SGPanel Procedure Loess 0 20000 40000 60000 80000 100000 Household Income (Mid-Point of Category) -1 0 1 2 Pearson Residual Age (yrs) = 40-49
The SGPanel Procedure Loess 0 20000 40000 60000 80000 100000 Household Income (Mid-Point of Category) -1 0 1 2 Pearson Residual Age (yrs) = 50-59
The SGPanel Procedure Loess 0 20000 40000 60000 80000 100000 Household Income (Mid-Point of Category) -1 0 1 2 Pearson Residual Age (yrs) = 60-69
The SGPanel Procedure Loess 0 20000 40000 60000 80000 100000 Household Income (Mid-Point of Category) -1 0 1 2 Pearson Residual Age (yrs) = 70-79
The SGPanel Procedure Loess 0 20000 40000 60000 80000 100000 Household Income (Mid-Point of Category) -1 0 1 2 Pearson Residual Age (yrs) = 80

The lack of any consistent patterns in these plots suggests that the correct structural model assumption is valid.

The effective sample size is \(2,592\).

ODS EXCLUDE ALL;
PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=DF4;
  EFFECT AgeCS=SPLINE(Age / 
      NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95));
  EFFECT IncomeCS=SPLINE(HHIncomeMid / 
       NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95));
    
  MODEL Obese(EVENT="Yes") = AgeCS IncomeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=DF5;
  EFFECT AgeCS=SPLINE(Age / 
      NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));
  EFFECT IncomeCS=SPLINE(HHIncomeMid / 
       NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=EQUAL(6));
    
  MODEL Obese(EVENT="Yes") = AgeCS IncomeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=DF6;
  EFFECT AgeCS=SPLINE(Age / 
      NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(2.5 18.3333 34.1667 50 65.8333 81.6667 97.5));
  EFFECT IncomeCS=SPLINE(HHIncomeMid / 
       NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=EQUAL(7));
    
  MODEL Obese(EVENT="Yes") = AgeCS IncomeCS / LINK=LOGIT;
RUN;


PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=Interaction;
  EFFECT AgeCS=SPLINE(Age / 
      NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95));
  EFFECT IncomeCS=SPLINE(HHIncomeMid / 
       NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95));
    
  MODEL Obese(EVENT="Yes") = AgeCS IncomeCS AgeCS*IncomeCS / LINK=LOGIT;
RUN;

DATA DF4;
  SET DF4;
  WHERE Test="Likelihood Ratio";
    
  Model = "Age+Income 4df";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;

DATA DF5;
  SET DF5;
  WHERE Test="Likelihood Ratio";
    
  Model = "Age+Income 5df";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;

DATA DF6;
  SET DF6;
  WHERE Test="Likelihood Ratio";
    
  Model = "Age+Income 6df";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;

DATA Interaction;
  SET Interaction;
  WHERE Test="Likelihood Ratio";
    
  Model = "Interaction";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;

DATA ModelComparisons;
  LENGTH Model $20;

  SET DF4 DF5 DF6 Interaction;
    
  KEEP Model DF AIC BIC;    
RUN;

PROC MEANS DATA=ModelComparisons MAX NOPRINT;
  VAR AIC BIC;
  OUTPUT OUT=MaxIC MAX= / AUTONAME;
RUN;

DATA ModelComparisons;
  IF _N_=1 THEN SET MaxIC;
  SET ModelComparisons;
    
  AIC = AIC_Max - AIC;
  BIC = BIC_Max - BIC;
    
  KEEP Model DF AIC BIC;
RUN;
ODS EXCLUDE NONE;

TITLE "Model Comparisons (AIC/BIC)";
PROC PRINT DATA=ModelComparisons NOOBS;
  VAR Model DF AIC BIC;  
    
  FORMAT AIC BIC 6.1;
RUN;
TITLE;

Model Comparisons (AIC/BIC)

Model DF AIC BIC
Age+Income 4df 8 13.3 0.0
Age+Income 5df 10 6.3 4.7
Age+Income 6df 12 10.6 20.7
Interaction 24 0.0 80.4

Based on AIC, there is conclusive evidence for the interaction model, while, based on BIC, there is conclusive evidence for the additive model. In most situations, I will use BIC, which more strongly favors the pre-specified model, rather than AIC, which makes it easier to switch to a more complex model. In practice, the choice of AIC or BIC should be made prior to examining the data.

Odds Ratios for Age Adjusted for Income

ESTIMATE statements can be used to obtain the odds ratios comparing various pairs of ages adjusted for income.

ODS EXCLUDE ALL;
PROC PLM RESTORE=AgeIncome;
  ODS OUTPUT Estimates=E;

  ESTIMATE "Age 30 v 20" AgeCS [1,30] [-1, 20] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 40 v 30" AgeCS [1,40] [-1, 30] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 50 v 40" AgeCS [1,50] [-1, 40] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 60 v 50" AgeCS [1,60] [-1, 50] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 70 v 60" AgeCS [1,70] [-1, 60] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 80 v 70" AgeCS [1,80] [-1, 70] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 30 v 50" AgeCS [1,30] [-1, 50] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 70 v 50" AgeCS [1,70] [-1, 50] / CL EXP ALPHA=0.03125;
RUN;
ODS EXCLUDE NONE;

PROC PRINT DATA=E NOOBS LABEL;
  VAR Label LowerExp ExpEstimate UpperExp;
  LABEL Label='00'x
      LowerExp="Lower CL"
      ExpEstimate="Estimate"
      UpperExp="Upper CL";
  TITLE "Odds Ratios for Age Adjusted for Income";
RUN;
TITLE;

Odds Ratios for Age Adjusted for Income

  Lower CL Estimate Upper CL
Age 30 v 20 1.4595 1.8479 2.3397
Age 40 v 30 0.9733 1.0794 1.1970
Age 50 v 40 0.8747 0.9697 1.0750
Age 60 v 50 1.0455 1.1980 1.3727
Age 70 v 60 0.7545 0.8247 0.9014
Age 80 v 70 0.5520 0.6558 0.7791
Age 30 v 50 0.8048 0.9554 1.1341
Age 70 v 50 0.8474 0.9879 1.1518

Adjusted for income, the odds ratios for obesity are \((1.46,2.34)\) comparing age 30 to age 20, \((0.97,1.20)\) comparing age 40 to age 30, \((0.87,1.08)\) comparing age 50 to age 40, \((1.05,1.37)\) comparing age 60 to age 50, \((0.75,0.90)\) comparing age 70 to age 60, and \((0.55,0.78)\) comparing age 80 to age 70.

Odds Ratios for Income Adjusted for Age

ESTIMATE statements can be used to obtain the odds ratios comparing various pairs of incomes adjusted for age.

ODS EXCLUDE ALL;
PROC PLM RESTORE=AgeIncome;
  ODS OUTPUT Estimates=E;

  ESTIMATE "Income $7,500 v $2,500" IncomeCS [1, 7500] [-1, 2500] / CL EXP ALPHA=0.03125;
  ESTIMATE "Income $12,500 v $7,500" IncomeCS [1, 12500] [-1, 7500] / CL EXP ALPHA=0.03125;
  ESTIMATE "Income $17,500 v $12,500" IncomeCS [1, 17500] [-1, 12500] / CL EXP ALPHA=0.03125;
  ESTIMATE "Income $22,500 v $17,500" IncomeCS [1, 22500] [-1, 175500] / CL EXP ALPHA=0.03125;
  ESTIMATE "Income $30,000 v $22,500" IncomeCS [1, 30000] [-1, 22500] / CL EXP ALPHA=0.03125;
  ESTIMATE "Income $40,000 v $30,000" IncomeCS [1, 40000] [-1, 30000] / CL EXP ALPHA=0.03125;
  ESTIMATE "Income $50,000 v $40,000" IncomeCS [1, 50000] [-1, 40000] / CL EXP ALPHA=0.03125;
  ESTIMATE "Income $60,000 v $50,000" IncomeCS [1, 60000] [-1, 50000] / CL EXP ALPHA=0.03125;
  ESTIMATE "Income $70,000 v $60,000" IncomeCS [1, 70000] [-1, 60000] / CL EXP ALPHA=0.03125;
  ESTIMATE "Income $87,500 v $70,000" IncomeCS [1, 87500] [-1, 70000] / CL EXP ALPHA=0.03125;
RUN;
ODS EXCLUDE NONE;

PROC PRINT DATA=E NOOBS LABEL;
  VAR Label LowerExp ExpEstimate UpperExp;
  LABEL Label='00'x
      LowerExp="Lower CL"
      ExpEstimate="Estimate"
      UpperExp="Upper CL";
  TITLE "Odds Ratios for Income Adjusted for Age";
RUN;
TITLE;

Odds Ratios for Income Adjusted for Age

  Lower CL Estimate Upper CL
Income $7,500 v $2,500 0.9506 1.0125 1.0784
Income $12,500 v $7,500 0.9506 1.0125 1.0784
Income $17,500 v $12,500 0.9511 1.0114 1.0756
Income $22,500 v $17,500 3.8961 14.3206 52.6364
Income $30,000 v $22,500 0.9347 0.9810 1.0296
Income $40,000 v $30,000 0.8468 0.9192 0.9977
Income $50,000 v $40,000 0.8445 0.9241 1.0112
Income $60,000 v $50,000 0.9369 0.9948 1.0563
Income $70,000 v $60,000 0.9364 1.0246 1.1211
Income $87,500 v $70,000 0.8188 0.9009 0.9913

Adjusted for age, the odds ratios for obesity are \((0.95,1.08)\) comparing a household income of $7,500 to $2,500, \((0.95,1.08)\) comparing a household income of $17,500 to $12,500, \((0.93,1.03)\) comparing a household income of $30,000 to $22,500, \((0.84,1.01)\) comparing a household income of $50,000 to $40,000, and \((0.94,1.12)\) comparing a household income of $70,000 to $60,000.

Odds Ratio for Arbitrary Age-Income Combinations

ESTIMATE statements can be used to obtain the odds ratios comparing any two combinations of age and income.

ODS EXCLUDE ALL;
PROC PLM RESTORE=AgeIncome;
  ODS OUTPUT Estimates=E;

  ESTIMATE "Age 30, Income $22,500 v Age 50, Income $87,500" AgeCS [1, 30] [-1, 50]
      IncomeCS [1, 22500] [-1, 87500] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 30, Income $50,000 v Age 50, Income $87,500" AgeCS [1, 30] [-1, 50]
      IncomeCS [1, 50000] [-1, 87500] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 70, Income $22,500 v Age 50, Income $87,500" AgeCS [1, 70] [-1, 50]
      IncomeCS [1, 22500] [-1, 87500] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 70, Income $50,000 v Age 50, Income $87,500" AgeCS [1, 70] [-1, 50]
      IncomeCS [1, 50000] [-1, 87500] / CL EXP ALPHA=0.03125;
RUN;
ODS EXCLUDE NONE;

PROC PRINT DATA=E NOOBS LABEL;
  VAR Label LowerExp ExpEstimate UpperExp;
  LABEL Label='00'x
      LowerExp="Lower CL"
      ExpEstimate="Estimate"
      UpperExp="Upper CL";
  TITLE "Odds Ratios";
RUN;
TITLE;

Odds Ratios

  Lower CL Estimate Upper CL
Age 30, Income $22,500 v Age 50, Income $87,500 0.9940 1.2485 1.5683
Age 30, Income $50,000 v Age 50, Income $87,500 0.7944 1.0404 1.3626
Age 70, Income $22,500 v Age 50, Income $87,500 1.0456 1.2910 1.5940
Age 70, Income $50,000 v Age 50, Income $87,500 0.8335 1.0758 1.3886

The odds ratios for obesity are \((0.99,1.57)\) comparing 30-year-olds with a household income of $22,500 to 50-year-olds with a household income of $87,500, \((0.79,1.36)\) comparing 30-year-olds with a household income of $50,000 to 50-year-olds with a household income of $87,500, \((1.05,1.59)\) comparing 70-year-olds with a household income of $22,500 to 50-year-olds with a household income of $87,500, and \((0.83,1.39)\) comparing 70-year-olds with a household income of $50,000 to 50-year-olds with a household income of $87,500.

Prevalence by Age and Income

ESTIMATE statements can be used to obtain the prevalence of obesity for any combination of age and income. Note that, in the context of the current model, the prevalence of obesity cannot be obtained for a specific age without also specifying a specific income (and vice versa). There is no single prevalence of obesity for 50 year olds.

ODS EXCLUDE ALL;
PROC PLM RESTORE=AgeIncome;
  ODS OUTPUT Estimates=E;

  ESTIMATE "Age 30, Income $7,500"  Intercept 1 AgeCS [1, 30] IncomeCS [1, 7500] 
      / CL ILINK ALPHA=0.03125;
  ESTIMATE "Age 50, Income $7,500"  Intercept 1 AgeCS [1, 50] IncomeCS [1, 7500] 
       / CL ILINK ALPHA=0.03125;
  ESTIMATE "Age 70, Income $7,500"  Intercept 1 AgeCS [1, 70] IncomeCS [1, 7500] 
      / CL ILINK ALPHA=0.03125;
  ESTIMATE "Age 30, Income $70,000"  Intercept 1 AgeCS [1, 30] IncomeCS [1, 70000] 
      / CL ILINK ALPHA=0.03125;
  ESTIMATE "Age 50, Income $70,000"  Intercept 1 AgeCS [1, 50] IncomeCS [1, 70000] 
      / CL ILINK ALPHA=0.03125;
  ESTIMATE "Age 70, Income $70,000"  Intercept 1 AgeCS [1, 70] IncomeCS [1, 70000] 
      / CL ILINK ALPHA=0.03125;
RUN;
ODS EXCLUDE NONE;

PROC PRINT DATA=E NOOBS LABEL;
  VAR Label LowerMu Mu UpperMu;
  LABEL Label='00'x
      Mu="Estimate"
      LowerMu="Lower CL"
      UpperMu="Upper CL";
  TITLE "Prevalence of Obesity (Additive)";
RUN;
TITLE;

Prevalence of Obesity (Additive)

  Lower CL Estimate Upper CL
Age 30, Income $7,500 0.3611 0.4066 0.4537
Age 50, Income $7,500 0.3723 0.4176 0.4645
Age 70, Income $7,500 0.3700 0.4147 0.4608
Age 30, Income $70,000 0.3398 0.3746 0.4106
Age 50, Income $70,000 0.3502 0.3853 0.4216
Age 70, Income $70,000 0.3471 0.3824 0.4191

The estimated prevalence of obesity is \((0.36,0.45)\) for 30-year-olds with a household income of $7,500, \((0.37,0.46)\) for 50-year-olds with a household income of $7,500, and \((0.37,0.46)\) for 70-year-olds with a household income of $7,500; it is \((0.34,0.41)\) for 30-year-olds with a household income of $70,000, \((0.35,0.42)\) for 50-year-olds with a household income of $70,000, and \((0.35,0.42)\) for 70-year-olds with a household income of $70,000.

The estimated regression functions (log odds of obesity as a function of age and race) from the additive model can be plotted using PROC SGPLOT.

ODS EXCLUDE ALL;  
DATA AllAgesAllIncomes;
  DO HHIncomeMid = 2500 TO 87500 BY 2500;
    DO Age = 18 TO 79 BY 1;
      OUTPUT;
    END;
  END;
RUN;

PROC PLM RESTORE=AgeIncome;
  SCORE DATA=AllAgesAllIncomes OUT=Fitted Predicted LCLM=Lower_Logit UCLM=Upper_Logit / ALPHA=0.03125;
RUN;
      
DATA Fitted;
  SET Fitted;
      
  Lower_Prob = 1/(1+EXP(-Lower_Logit));
  Upper_Prob = 1/(1+EXP(-Upper_Logit));
RUN;      
ODS EXCLUDE NONE;

PROC SGPLOT DATA=Fitted;
  STYLEATTRS DATACOLORS=(Cx1b9e77 Cxd95f02 Cx7570b3 Cxe7298a Cx66a61e Cxe6ab02 Cxa6761d Cx666666);
  BAND X=Age Lower=Lower_Logit Upper=Upper_Logit /
      GROUP=HHIncomeMid TRANSPARENCY=0.7;
  YAXIS LABEL="Logit (Log Odds) of Obesity";
  WHERE HHIncomeMid IN (10000 30000 50000 70000);
RUN;

PROC SGPLOT DATA=Fitted;
  STYLEATTRS DATACOLORS=(Cx1b9e77 Cxd95f02 Cx7570b3 Cxe7298a Cx66a61e Cxe6ab02 Cxa6761d Cx666666);
  BAND X=HHIncomeMid Lower=Lower_Logit Upper=Upper_Logit /
      GROUP=Age TRANSPARENCY=0.7;
  YAXIS LABEL="Logit (Log Odds) of Obesity";
  WHERE Age IN (30 40 50 60 70);
RUN;

Alternatively, the estimated prevalence of obesity can also be plotted using PROC SGPLOT.

PROC SGPLOT DATA=Fitted;
  STYLEATTRS DATACOLORS=(Cx1b9e77 Cxd95f02 Cx7570b3 Cxe7298a Cx66a61e Cxe6ab02 Cxa6761d Cx666666);
  BAND X=Age Lower=Lower_Prob Upper=Upper_Prob /
      GROUP=HHIncomeMid TRANSPARENCY=0.7;
  YAXIS LABEL="Prevance of Obesity";
  WHERE HHIncomeMid IN (10000 30000 50000 70000);
RUN;

PROC SGPLOT DATA=Fitted;
  STYLEATTRS DATACOLORS=(Cx1b9e77 Cxd95f02 Cx7570b3 Cxe7298a Cx66a61e Cxe6ab02 Cxa6761d Cx666666);
  BAND X=HHIncomeMid Lower=Lower_Prob Upper=Upper_Prob /
      GROUP=Age TRANSPARENCY=0.7;
  YAXIS LABEL="Prevalence of Obesity";
  WHERE Age IN (30 40 50 60 70);
RUN;

Hypothesis Tests

Likelihood ratio tests can be used to assess the evidence for (1) any effect of age on prevalence of obesity after accounting for income, (2) any effect of income on prevalence of obesity after accounting for age, (3) non-linearity of the effect(s) of age, and (4) non-linearity of the effect(s) of income. In general, one should not reduce the complexity of the model (drop variables) based on the results of a hypothesis test.

ODS EXCLUDE ALL;
PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=Full;
  EFFECT AgeCS=SPLINE(Age / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);
  EFFECT IncomeCS=SPLINE(HHIncomeMid / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);
    
  MODEL Obese(EVENT="Yes") = AgeCS IncomeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=LinearAge;
  EFFECT IncomeCS=SPLINE(HHIncomeMid / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);
    
  MODEL Obese(EVENT="Yes") = Age IncomeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=LinearIncome;
  EFFECT AgeCS=SPLINE(Age / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);

  MODEL Obese(EVENT="Yes") = AgeCS HHIncomeMid / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=AgeOnly;
  EFFECT AgeCS=SPLINE(Age / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);

  MODEL Obese(EVENT="Yes") = AgeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=IncomeOnly;
  EFFECT IncomeCS=SPLINE(HHIncomeMid / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);
    
  MODEL Obese(EVENT="Yes") = IncomeCS / LINK=LOGIT;
RUN;

DATA Full;
  SET Full;
  WHERE Test="Likelihood Ratio";
  RENAME ChiSq=ChiSq_Full DF=DF_Full;
RUN;

DATA LinearAge;
  LENGTH Model $30 Test $30;
  SET LinearAge;
  WHERE Test="Likelihood Ratio";
    
  Model = "Income+Linear Age";
  Test = "Non-Linear Age";
    
  KEEP Model Test ChiSq DF;
RUN;

DATA LinearIncome;
  LENGTH Model $30 Test $30;
  SET LinearIncome;
  WHERE Test="Likelihood Ratio";
    
  Model = "Age+Linear Income";
  Test = "Non-Linear Income";
    
  KEEP Model Test ChiSq DF;
RUN;

DATA IncomeOnly;
  LENGTH Model $30 Test $30;
  SET IncomeOnly;
  WHERE Test="Likelihood Ratio";
    
  Model = "Income Only";
  Test = "Age";
    
  KEEP Model Test ChiSq DF;
RUN;

DATA AgeOnly;
  LENGTH Model $30 Test $30;
  SET AgeOnly;
  WHERE Test="Likelihood Ratio";
    
  Model = "Age Only";
  Test = "Income";
    
  KEEP Model Test ChiSq DF;
RUN;

DATA LRTests;
  LENGTH Model $30 Test $30;

  IF _N_ = 1 THEN SET Full;
  SET LinearAge LinearIncome AgeOnly IncomeOnly;
    
  Chisq = Chisq_Full-Chisq;
  DF = DF_Full - DF;
  ProbChiSq = EXP(LOGSDF("CHISQUARE",Chisq,DF));
  sValue = -LOG(ProbChiSq)/LOG(2);
    
  KEEP Test DF ChiSq ProbChiSq sValue;    
RUN;
ODS EXCLUDE NONE;

TITLE "Likelihood Ratio Tests";
PROC PRINT DATA=LRTests NOOBS;
  VAR Test DF ChiSq ProbChiSq sValue;  

  FORMAT sValue 6.2;
  FORMAT ChiSq 6.2;
RUN;
TITLE;

Likelihood Ratio Tests

Test DF ChiSq ProbChiSq sValue
Non-Linear Age 3 62.98 <.0001 42.74
Non-Linear Income 3 10.81 0.0128 6.29
Income 4 70.96 <.0001 46.00
Age 4 67.43 <.0001 43.52

There is extremely strong evidence that the prevalence of obesity varies by age \((p \approx 0, s=43.5)\); the relationship between age and the log odds of obesity is non-linear \((p \approx 0, s=42.7)\).

There is very strong evidence that the prevalence of obesity varies by income \((p \approx 0, s=46.0)\); there is some evidence that the relationship between income and the log odds of obesity is non-linear \((p=0.01, s=6.3)\).

Interaction Model

We denote the obesity outcome for subject \(i\) by \(y_i\), the age for subject \(i\) by \(x_{1i}\), and the income for subject \(i\) by \(x_{2i}\).

A logistic regression model for (prevalence of) obesity as a function of age, income and their interaction is \[ y_i \sim \mbox{Bernoulli}\left\{p_i\right\} \\ \mbox{logit}\left\{p_i\right\} = \log{\left\{\frac{p_i}{1-p_i}\right\}} = \alpha_0 + f_{a}(x_{1i},{\bf \beta}) + f_{i}(x_{2i},{\bf \gamma}) + f_{a,i}(x_{1i},x_{2i},{\bf \delta})\]

In our regression model, the interaction term is represented as \[\begin{eqnarray*} f_{a,i}(x_{1}, x_{2}, {\bf \delta}) & = & \delta_{11} a_1(x_{1}) \times i_1(x_{2}) + \delta_{12} a_1(x_{1}) \times i_2(x_{2}) + \delta_{13} a_1(x_{1}) \times i_3(x_{2}) + \delta_{14} a_1(x_{1}) \times i_4(x_{2}) + \\ & & \delta_{21} a_2(x_{1}) \times i_1(x_{2}) + \delta_{22} a_2(x_{1}) \times i_2(x_{2}) + \delta_{23} a_2(x_{1}) \times i_3(x_{2}) + \delta_{24} a_2(x_{1}) \times i_4(x_{2}) + \\ & & \delta_{31} a_3(x_{1}) \times i_1(x_{2}) + \delta_{32} a_3(x_{1}) \times i_2(x_{2}) + \delta_{33} a_3(x_{1}) \times i_3(x_{2}) + \delta_{34} a_4(x_{1}) \times i_3(x_{2}) + \\ & & \delta_{41} a_4(x_{1}) \times i_1(x_{2}) + \delta_{42} a_4(x_{1}) \times i_2(x_{2}) + \delta_{43} a_4(x_{1}) \times i_3(x_{2}) + \delta_{44} a_4(x_{1}) \times i_4(x_{2}) \\ \end{eqnarray*}\]

This representation of the interaction term is based on \(16~=4 \times 4\) products of the 4 variables used to encode age \(a_1(x_1), a_2(x_1), a_3(x_1), a_4(x_1)\) and the 4 variables used to encode income \(i_1(x_2), i_2(x_2), i_3(x_2), i_4(x_2)\); it is called a tensor product spline.

ODS SELECT NONE;
PROC LOGISTIC DATA=Ron;
*  ODS SELECT ResponseProfile;
*  ODS SELECT SplineKnots;
  EFFECT AgeCS=SPLINE(Age / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);
  EFFECT IncomeCS=SPLINE(HHIncomeMid / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);
    
  MODEL Obese(EVENT="Yes") = AgeCS IncomeCS AgeCS*IncomeCS / LINK=LOGIT;

  OUTPUT OUT=Tmp RESCHI=Residual PREDICTED=Fitted; 

  STORE AgeIncome;
RUN;
ODS SELECT ALL;

Assessing Adequacy of Modeling Assumptions

Before using the model, we must evaluate the adequacy of the modeling assumptions. For binary regression models, the only assumptions are correct structural model and independence (which cannot be evaluated based on the observed data). We can identify potential inadequacies in the representation of age and/or income, e.g. insufficient df, either using informal plots of the residuals against age by income or income by age or formal comparisons of fit (AIC or BIC) for models with more df devoted to the age effect.

PROC FORMAT;
  VALUE IncomeGrp
    LOW-19999="0-19999"
    20000-39999="20000-39999"
    40000-59999="40000-59999"
    60000-79999="60000-79999"
    80000-99999="80000-99999"
    ;

  VALUE AgeGrp
    LOW-29="18-29"
    30-39="30-39"
    40-49="40-49"
    50-59="50-59"
    60-69="60-69"
    70-79="70-79"
    ;
RUN;
        
PROC SGPANEL DATA=Tmp;
  FORMAT HHIncomeMid IncomeGrp.;
  LABEL HHIncomeMid="Household Income";
  PANELBY HHIncomeMid / LAYOUT=ROWLATTICE;
    REFLINE 0 / AXIS=Y;
  LOESS X=Age Y=Residual / JITTER;
RUN;

PROC SGPANEL DATA=Tmp;
  FORMAT Age AgeGrp.;
  PANELBY Age / LAYOUT=ROWLATTICE;
    REFLINE 0 / AXIS=Y;
  LOESS X=HHIncomeMid Y=Residual / JITTER; 
RUN;
The SGPanel Procedure Loess Pearson Residual 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 -1 0 1 2 Household Income = 20000-39999 Household Income = 0-19999
The SGPanel Procedure Loess Pearson Residual 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 -1 0 1 2 Household Income = 60000-79999 Household Income = 40000-59999
The SGPanel Procedure Loess Pearson Residual 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 -1 0 1 2 Household Income = 100000 Household Income = 80000-99999

The SGPanel Procedure Loess 0 20000 40000 60000 80000 100000 Household Income (Mid-Point of Category) -1 0 1 2 Pearson Residual Age (yrs) = 18-29
The SGPanel Procedure Loess 0 20000 40000 60000 80000 100000 Household Income (Mid-Point of Category) -1 0 1 2 Pearson Residual Age (yrs) = 30-39
The SGPanel Procedure Loess 0 20000 40000 60000 80000 100000 Household Income (Mid-Point of Category) -1 0 1 2 Pearson Residual Age (yrs) = 40-49
The SGPanel Procedure Loess 0 20000 40000 60000 80000 100000 Household Income (Mid-Point of Category) -1 0 1 2 Pearson Residual Age (yrs) = 50-59
The SGPanel Procedure Loess 0 20000 40000 60000 80000 100000 Household Income (Mid-Point of Category) -1 0 1 2 Pearson Residual Age (yrs) = 60-69
The SGPanel Procedure Loess 0 20000 40000 60000 80000 100000 Household Income (Mid-Point of Category) -1 0 1 2 Pearson Residual Age (yrs) = 70-79
The SGPanel Procedure Loess 0 20000 40000 60000 80000 100000 Household Income (Mid-Point of Category) -1 0 1 2 Pearson Residual Age (yrs) = 80

The lack of any consistent patterns in these plots suggests that the correct structural model assumption is valid.

The effective sample size is \(2,592\).

ODS EXCLUDE ALL;
PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=DF4;
  EFFECT AgeCS=SPLINE(Age / 
      NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95));
  EFFECT IncomeCS=SPLINE(HHIncomeMid / 
       NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95));
    
  MODEL Obese(EVENT="Yes") = AgeCS IncomeCS AgeCS*IncomeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=DF5;
  EFFECT AgeCS=SPLINE(Age / 
      NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));
  EFFECT IncomeCS=SPLINE(HHIncomeMid / 
       NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=EQUAL(6));
    
  MODEL Obese(EVENT="Yes") = AgeCS IncomeCS AgeCS*IncomeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=DF6;
  EFFECT AgeCS=SPLINE(Age / 
      NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(2.5 18.3333 34.1667 50 65.8333 81.6667 97.5));
  EFFECT IncomeCS=SPLINE(HHIncomeMid / 
       NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=EQUAL(7));
    
  MODEL Obese(EVENT="Yes") = AgeCS IncomeCS AgeCS*IncomeCS / LINK=LOGIT;
RUN;

DATA DF4;
  SET DF4;
  WHERE Test="Likelihood Ratio";
    
  Model = "Age+Income 4df";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;

DATA DF5;
  SET DF5;
  WHERE Test="Likelihood Ratio";
    
  Model = "Age+Income 5df";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;

DATA DF6;
  SET DF6;
  WHERE Test="Likelihood Ratio";
    
  Model = "Age+Income 6df";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;

DATA ModelComparisons;
  LENGTH Model $20;

  SET DF4 DF5 DF6;
    
  KEEP Model DF AIC BIC;    
RUN;

PROC MEANS DATA=ModelComparisons MAX NOPRINT;
  VAR AIC BIC;
  OUTPUT OUT=MaxIC MAX= / AUTONAME;
RUN;

DATA ModelComparisons;
  IF _N_=1 THEN SET MaxIC;
  SET ModelComparisons;
    
  AIC = AIC_Max - AIC;
  BIC = BIC_Max - BIC;
    
  KEEP Model DF AIC BIC;
RUN;
ODS EXCLUDE NONE;

TITLE "Model Comparisons (AIC/BIC)";
PROC PRINT DATA=ModelComparisons NOOBS;
  VAR Model DF AIC BIC;  
    
  FORMAT AIC BIC 6.1;
RUN;
TITLE;

Model Comparisons (AIC/BIC)

Model DF AIC BIC
Age+Income 4df 24 19.1 0.0
Age+Income 5df 35 10.4 55.7
Age+Income 6df 48 0.0 121.5

The pre-specified model with 4df for age and income is optimal using either AIC or BIC.

Odds Ratios for Age by Income

ESTIMATE statements can be used to obtain the odds ratios comparing various pairs of ages at the same fixed income.

ODS EXCLUDE ALL;
PROC PLM RESTORE=AgeIncome;
  ODS OUTPUT Estimates=E;

  ESTIMATE "Age 30 v 50 for Income $12,500" AgeCS [1,30] [-1, 50] 
      AgeCS*IncomeCS [1, 30 12500] [-1, 50 12500] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 70 v 50 for Income $12,500" AgeCS [1,70] [-1, 50] 
      AgeCS*IncomeCS [1, 70 12500] [-1, 50 12500] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 30 v 50 for Income $50,000" AgeCS [1,30] [-1, 50] 
      AgeCS*IncomeCS [1, 30 50000] [-1, 50 50000] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 70 v 50 for Income $50,000" AgeCS [1,70] [-1, 50] 
      AgeCS*IncomeCS [1, 70 50000] [-1, 50 50000] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 30 v 50 for Income $87,500" AgeCS [1,30] [-1, 50] 
      AgeCS*IncomeCS [1, 30 87500] [-1, 50 87500] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 70 v 50 for Income $87,500" AgeCS [1,70] [-1, 50] 
      AgeCS*IncomeCS [1, 70 87500] [-1, 50 87500] / CL EXP ALPHA=0.03125;
RUN;
ODS EXCLUDE NONE;

PROC PRINT DATA=E NOOBS LABEL;
  VAR Label LowerExp ExpEstimate UpperExp;
  LABEL Label='00'x
      LowerExp="Lower CL"
      ExpEstimate="Estimate"
      UpperExp="Upper CL";
  TITLE "Odds Ratios for Age Adjusted for Income";
RUN;
TITLE;

Odds Ratios for Age Adjusted for Income

  Lower CL Estimate Upper CL
Age 30 v 50 for Income $12,500 0.8546 1.2811 1.9205
Age 70 v 50 for Income $12,500 0.9166 1.3183 1.8962
Age 30 v 50 for Income $50,000 1.0367 1.5305 2.2595
Age 70 v 50 for Income $50,000 0.8386 1.1914 1.6928
Age 30 v 50 for Income $87,500 0.2582 0.3884 0.5843
Age 70 v 50 for Income $87,500 0.3667 0.5403 0.7962

The odds ratios for obesity comparing 30-year-olds to 50-year-olds are \((0.85,1.92)\) for a household income of $12,500, \((1.04,2.26)\) for a household income of $50,000, and \((0.26,0.58)\) for a household income of $87,500, while the odds ratios for obesity comparing 70-year-olds to 50-year-olds are \((0.92,1.90)\) for a household income of $12,500, \((0.84,1.69)\) for a household income of $50,000, and \((0.36,0.80)\) for a household income of $87,500.

Odds Ratios for Income by Age

ESTIMATE statements can be used to obtain the odds ratios comparing various pairs of incomes at the same fixed age.

ODS EXCLUDE ALL;
PROC PLM RESTORE=AgeIncome;
  ODS OUTPUT Estimates=E;

  ESTIMATE "Income $12,500 v $50,000 for Age 30" IncomeCS [1, 12500] [-1, 50000] 
      AgeCS*IncomeCS [1, 30 12500] [-1, 30 50000] / CL EXP ALPHA=0.03125;
  ESTIMATE "Income $12,500 v $50,000 for Age 50" IncomeCS [1, 12500] [-1, 50000] 
      AgeCS*IncomeCS [1, 50 12500] [-1, 50 50000] / CL EXP ALPHA=0.03125;
  ESTIMATE "Income $12,500 v $50,000 for Age 70" IncomeCS [1, 12500] [-1, 50000] 
      AgeCS*IncomeCS [1, 70 12500] [-1, 70 50000] / CL EXP ALPHA=0.03125;
  ESTIMATE "Income $87,500 v $50,000 for Age 30" IncomeCS [1, 87500] [-1, 50000] 
      AgeCS*IncomeCS [1, 30 87500] [-1, 30 50000] / CL EXP ALPHA=0.03125;
  ESTIMATE "Income $87,500 v $50,000 for Age 50" IncomeCS [1, 87500] [-1, 50000] 
      AgeCS*IncomeCS [1, 50 87500] [-1, 50 50000] / CL EXP ALPHA=0.03125;
  ESTIMATE "Income $87,500 v $50,000 for Age 70" IncomeCS [1, 87500] [-1, 50000] 
      AgeCS*IncomeCS [1, 70 87500] [-1, 70 50000] / CL EXP ALPHA=0.03125;
RUN;
ODS EXCLUDE NONE;

PROC PRINT DATA=E NOOBS LABEL;
  VAR Label LowerExp ExpEstimate UpperExp;
  LABEL Label='00'x
      LowerExp="Lower CL"
      ExpEstimate="Estimate"
      UpperExp="Upper CL";
  TITLE "Odds Ratios for Age Adjusted for Income";
RUN;
TITLE;

Odds Ratios for Age Adjusted for Income

  Lower CL Estimate Upper CL
Income $12,500 v $50,000 for Age 30 0.6987 0.9923 1.4093
Income $12,500 v $50,000 for Age 50 0.8333 1.1854 1.6863
Income $12,500 v $50,000 for Age 70 0.9333 1.3117 1.8433
Income $87,500 v $50,000 for Age 30 0.3020 0.4525 0.6778
Income $87,500 v $50,000 for Age 50 1.2099 1.7829 2.6274
Income $87,500 v $50,000 for Age 70 0.5256 0.8085 1.2438

The odds ratios for obesity comparing household incomes of $12,500 to $50,000 are \((0.83,1.69)\) for age 30, \((0.93,1.84)\) for age 50, and \((0.97,2.09)\) for age 70, while the odds ratios for obesity comparing household incomes of $87,500 to $50,000 are \((0.30,0.68)\) for age 30, \((1.21,2.63)\) for age 50, and \((0.53,1.24)\) for age 70.

Odds Ratios for Arbitrary Age-Income Combinations

ESTIMATE statements can be used to obtain the odds ratios comparing any two combinations of age and income.

ODS EXCLUDE ALL;
PROC PLM RESTORE=AgeIncome;
  ODS OUTPUT Estimates=E;

  ESTIMATE "Age 30, Income $22,500 v Age 50, Income $87,500" AgeCS [1, 30] [-1, 50]
      IncomeCS [1, 22500] [-1, 87500] 
      AgeCS*IncomeCS [1, 30 22500] [-1, 50 87500] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 30, Income $50,000 v Age 50, Income $87,500" AgeCS [1, 30] [-1, 50]
      IncomeCS [1, 50000] [-1, 87500] 
      AgeCS*IncomeCS [1, 30 50000] [-1, 50 87500] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 70, Income $22,500 v Age 50, Income $87,500" AgeCS [1, 70] [-1, 50]
      IncomeCS [1, 22500] [-1, 87500] 
      AgeCS*IncomeCS [1, 70 22500] [-1, 50 87500] / CL EXP ALPHA=0.03125;
  ESTIMATE "Age 70, Income $50,000 v Age 50, Income $87,500" AgeCS [1, 70] [-1, 50]
      IncomeCS [1, 50000] [-1, 87500] 
      AgeCS*IncomeCS [1, 70 50000] [-1, 50 87500] / CL EXP ALPHA=0.03125;
RUN;
ODS EXCLUDE NONE;

PROC PRINT DATA=E NOOBS LABEL;
  VAR Label LowerExp ExpEstimate UpperExp;
  LABEL Label='00'x
      LowerExp="Lower CL"
      ExpEstimate="Estimate"
      UpperExp="Upper CL";
  TITLE "Odds Ratios";
RUN;
TITLE;

Odds Ratios

  Lower CL Estimate Upper CL
Age 30, Income $22,500 v Age 50, Income $87,500 0.5655 0.7824 1.0824
Age 30, Income $50,000 v Age 50, Income $87,500 0.6180 0.8584 1.1925
Age 70, Income $22,500 v Age 50, Income $87,500 0.6792 0.9279 1.2677
Age 70, Income $50,000 v Age 50, Income $87,500 0.4734 0.6682 0.9432

The odds ratios for obesity are \((0.57,1.08)\) comparing 30-year-olds with a household income of $22,500 to 50-year-olds with a household income of $87,500, \((0.62,1.19)\) comparing 30-year-olds with a household income of $50,000 to 50-year-olds with a household income of $87,500, \((0.68,1.27)\) comparing 70-year-olds with a household income of $22,500 to 50-year-olds with a household income of $87,500, and \((0.47,0.94)\) comparing 70-year-olds with a household income of $50,000 to 50-year-olds with a household income of $87,500.

Prevalence by Age and Income

ESTIMATE statements can be used to obtain the prevalence of obesity for any combination of age and income. Note that, in the context of the current model, the prevalence of obesity cannot be obtained for a specific age without also specifying a specific income (and vice versa). There is no single prevalence of obesity for 50 year olds.

ODS EXCLUDE ALL;
PROC PLM RESTORE=AgeIncome;
  ODS OUTPUT Estimates=E;

  ESTIMATE "Age 30, Income $7,500"  Intercept 1 AgeCS [1, 30] IncomeCS [1, 7500] 
      AgeCS*IncomeCS [1, 30 7500] / CL ILINK ALPHA=0.03125;
  ESTIMATE "Age 50, Income $7,500"  Intercept 1 AgeCS [1, 50] IncomeCS [1, 7500] 
       AgeCS*IncomeCS [1, 50 7500] / CL ILINK ALPHA=0.03125;
  ESTIMATE "Age 70, Income $7,500"  Intercept 1 AgeCS [1, 70] IncomeCS [1, 7500] 
      AgeCS*IncomeCS [1, 70 7500] / CL ILINK ALPHA=0.03125;
  ESTIMATE "Age 30, Income $70,000"  Intercept 1 AgeCS [1, 30] IncomeCS [1, 70000] 
      AgeCS*IncomeCS [1, 30 70000] / CL ILINK ALPHA=0.03125;
  ESTIMATE "Age 50, Income $70,000"  Intercept 1 AgeCS [1, 50] IncomeCS [1, 70000] 
      AgeCS*IncomeCS [1, 50 70000] / CL ILINK ALPHA=0.03125;
  ESTIMATE "Age 70, Income $70,000"  Intercept 1 AgeCS [1, 70] IncomeCS [1, 70000] 
      AgeCS*IncomeCS [1, 70 70000] / CL ILINK ALPHA=0.03125;
RUN;
ODS EXCLUDE NONE;

PROC PRINT DATA=E NOOBS LABEL;
  VAR Label LowerMu Mu UpperMu;
  LABEL Label='00'x
      Mu="Estimate"
      LowerMu="Lower CL"
      UpperMu="Upper CL";
  TITLE "Prevalence of Obesity (Additive)";
RUN;
TITLE;

Prevalence of Obesity (Additive)

  Lower CL Estimate Upper CL
Age 30, Income $7,500 0.3631 0.4445 0.5290
Age 50, Income $7,500 0.2740 0.3494 0.4332
Age 70, Income $7,500 0.3521 0.4302 0.5120
Age 30, Income $70,000 0.2721 0.3228 0.3781
Age 50, Income $70,000 0.4001 0.4560 0.5130
Age 70, Income $70,000 0.2833 0.3372 0.3957

The estimated prevalence of obesity is \((0.36,0.53)\) for 30-year-olds with a household income of $7,500, \((0.27,0.43)\) for 50-year-olds with a household income of $7,500, and \((0.35,0.51)\) for 70-year-olds with a household income of $7,500; it is \((0.27,0.38)\) for 30-year-olds with a household income of $70,00, \((0.40,0.51)\) for 50-year-olds with a household income of $70,000, and \((0.28,0.40)\) for 70-year-olds with a household income of $70,000.

The estimated regression functions (log odds of obesity as a function of age and race) from the additive model can be plotted using PROC SGPLOT.

ODS EXCLUDE ALL;  
DATA AllAgesAllIncomes;
  DO HHIncomeMid = 2500 TO 87500 BY 2500;
    DO Age = 18 TO 79 BY 1;
      OUTPUT;
    END;
  END;
RUN;

PROC PLM RESTORE=AgeIncome;
  SCORE DATA=AllAgesAllIncomes OUT=Fitted Predicted LCLM=Lower_Logit UCLM=Upper_Logit / ALPHA=0.03125;
RUN;
      
DATA Fitted;
  SET Fitted;
      
  Lower_Prob = 1/(1+EXP(-Lower_Logit));
  Upper_Prob = 1/(1+EXP(-Upper_Logit));
RUN;      
ODS EXCLUDE NONE;

PROC SGPLOT DATA=Fitted;
  STYLEATTRS DATACOLORS=(Cx1b9e77 Cxd95f02 Cx7570b3 Cxe7298a Cx66a61e Cxe6ab02 Cxa6761d Cx666666);
  BAND X=Age Lower=Lower_Logit Upper=Upper_Logit /
      GROUP=HHIncomeMid TRANSPARENCY=0.7;
  YAXIS LABEL="Logit (Log Odds) of Obesity";
  WHERE HHIncomeMid IN (10000 30000 50000 70000);
RUN;

PROC SGPLOT DATA=Fitted;
  STYLEATTRS DATACOLORS=(Cx1b9e77 Cxd95f02 Cx7570b3 Cxe7298a Cx66a61e Cxe6ab02 Cxa6761d Cx666666);
  BAND X=HHIncomeMid Lower=Lower_Logit Upper=Upper_Logit /
      GROUP=Age TRANSPARENCY=0.7;
  YAXIS LABEL="Logit (Log Odds) of Obesity";
  WHERE Age IN (30 40 50 60 70);
RUN;

Alternatively, the estimated prevalence of obesity can also be plotted using PROC SGPLOT.

PROC SGPLOT DATA=Fitted;
  STYLEATTRS DATACOLORS=(Cx1b9e77 Cxd95f02 Cx7570b3 Cxe7298a Cx66a61e Cxe6ab02 Cxa6761d Cx666666);
  BAND X=Age Lower=Lower_Prob Upper=Upper_Prob /
      GROUP=HHIncomeMid TRANSPARENCY=0.7;
  YAXIS LABEL="Prevance of Obesity";
  WHERE HHIncomeMid IN (10000 30000 50000 70000);
RUN;

PROC SGPLOT DATA=Fitted;
  STYLEATTRS DATACOLORS=(Cx1b9e77 Cxd95f02 Cx7570b3 Cxe7298a Cx66a61e Cxe6ab02 Cxa6761d Cx666666);
  BAND X=HHIncomeMid Lower=Lower_Prob Upper=Upper_Prob /
      GROUP=Age TRANSPARENCY=0.7;
  YAXIS LABEL="Prevalence of Obesity";
  WHERE Age IN (30 40 50 60 70);
RUN;

Hypothesis Tests

Likelihood ratio tests can be used to assess the evidence for (1) any effect of age on prevalence of obesity after accounting for income, (2) any effect of income on prevalence of obesity after accounting for age, (3) non-linearity of the effect(s) of age, (4) non-linearity of the effect(s) of income, and (5) interaction between age and income. In general, one should not reduce the complexity of the model (drop variables) based on the results of a hypothesis test.

ODS EXCLUDE ALL;
PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=Full;
  EFFECT AgeCS=SPLINE(Age / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);
  EFFECT IncomeCS=SPLINE(HHIncomeMid / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);
    
  MODEL Obese(EVENT="Yes") = AgeCS IncomeCS AgeCS*IncomeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=Additive;
  EFFECT AgeCS=SPLINE(Age / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);
  EFFECT IncomeCS=SPLINE(HHIncomeMid / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);
    
  MODEL Obese(EVENT="Yes") = AgeCS IncomeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=LinearAge;
  EFFECT IncomeCS=SPLINE(HHIncomeMid / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);
    
  MODEL Obese(EVENT="Yes") = Age IncomeCS Age*IncomeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=LinearIncome;
  EFFECT AgeCS=SPLINE(Age / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);

  MODEL Obese(EVENT="Yes") = AgeCS HHIncomeMid AgeCS*HHIncomeMid / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=AgeOnly;
  EFFECT AgeCS=SPLINE(Age / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);

  MODEL Obese(EVENT="Yes") = AgeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=IncomeOnly;
  EFFECT IncomeCS=SPLINE(HHIncomeMid / NATURALCUBIC BASIS=TPF(NOINT) DEGREE=3 
                            KNOTMETHOD=PERCENTILELIST(5 27.5 50 72.5 95) DETAILS);
    
  MODEL Obese(EVENT="Yes") = IncomeCS / LINK=LOGIT;
RUN;

DATA Full;
  SET Full;
  WHERE Test="Likelihood Ratio";
  RENAME ChiSq=ChiSq_Full DF=DF_Full;
RUN;

DATA Additive;
  LENGTH Model $30 Test $30;
  SET Additive;
  WHERE Test="Likelihood Ratio";
    
  Model = "Additive";
  Test = "Interaction";
    
  KEEP Model Test ChiSq DF;
RUN;

DATA LinearAge;
  LENGTH Model $30 Test $30;
  SET LinearAge;
  WHERE Test="Likelihood Ratio";
    
  Model = "Income+Linear Age";
  Test = "Non-Linear Age";
    
  KEEP Model Test ChiSq DF;
RUN;

DATA LinearIncome;
  LENGTH Model $30 Test $30;
  SET LinearIncome;
  WHERE Test="Likelihood Ratio";
    
  Model = "Age+Linear Income";
  Test = "Non-Linear Income";
    
  KEEP Model Test ChiSq DF;
RUN;

DATA IncomeOnly;
  LENGTH Model $30 Test $30;
  SET IncomeOnly;
  WHERE Test="Likelihood Ratio";
    
  Model = "Income Only";
  Test = "Age";
    
  KEEP Model Test ChiSq DF;
RUN;

DATA AgeOnly;
  LENGTH Model $30 Test $30;
  SET AgeOnly;
  WHERE Test="Likelihood Ratio";
    
  Model = "Age Only";
  Test = "Income";
    
  KEEP Model Test ChiSq DF;
RUN;

DATA LRTests;
  LENGTH Model $30 Test $30;

  IF _N_ = 1 THEN SET Full;
  SET Additive LinearAge LinearIncome AgeOnly IncomeOnly;
    
  Chisq = Chisq_Full-Chisq;
  DF = DF_Full - DF;
  ProbChiSq = EXP(LOGSDF("CHISQUARE",Chisq,DF));
  sValue = -LOG(ProbChiSq)/LOG(2);
    
  KEEP Test DF ChiSq ProbChiSq sValue;    
RUN;
ODS EXCLUDE NONE;

TITLE "Likelihood Ratio Tests";
PROC PRINT DATA=LRTests NOOBS;
  VAR Test DF ChiSq ProbChiSq sValue;  

  FORMAT sValue 6.2;
  FORMAT ChiSq 6.2;
RUN;
TITLE;

Likelihood Ratio Tests

Test DF ChiSq ProbChiSq sValue
Interaction 16 45.34 0.0001 12.99
Non-Linear Age 15 103.51 <.0001 48.34
Non-Linear Income 15 46.00 <.0001 14.20
Income 20 116.30 <.0001 49.37
Age 20 112.77 <.0001 47.21

There is very strong evidence of an interaction between age and income in their relationship with the log odds of obesity \((13.0)\).

There is extremely strong evidence that the prevalence of obesity varies by age across subjects with the same household income \((p \approx 0, s=47.2)\); there is extremely strong evidence that the relationship between age and log odds of obesity is non-linear \((p \approx 0, s=48.3)\).

There is very strong evidence that the prevalence of obesity varies by household income across subjects of the same age \((p \approx 0, s=49.4)\); there is also very strong evidence that the relationship between income and log odds of obesity is non-linear \(p \approx 0, s=14.2)\).